Noise-Clustered Distant Supervision for Relation Extraction: A Nonparametric Bayesian Perspective
Abstract
For relation extraction, distant supervision is an efficient way to generate labeled data by aligning a knowledge base with free text. At its core, the task is a challenging incomplete multi-label classification problem with sparse and noisy features. To address this challenge, this work presents a novel nonparametric Bayesian formulation of the task. Experimental results show substantial improvements in top precision over traditional state-of-the-art approaches.
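The alignment step that distant supervision relies on can be illustrated with a minimal Python sketch. This is not the paper's model: the toy knowledge base `KB`, the example sentences, and the helper `distant_label` are all hypothetical, and entity matching is reduced to plain substring search. The sketch only shows why the resulting labels are multi-label and noisy.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head, tail) -> set of known relations.
KB = {
    ("Barack Obama", "Hawaii"): {"born_in"},
    ("Barack Obama", "United States"): {"president_of", "citizen_of"},
}

# Hypothetical free-text corpus.
SENTENCES = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii last week.",  # noisy: does not express born_in
    "Barack Obama led the United States for eight years.",
]


def distant_label(sentences, kb):
    """Group sentences by entity pair and attach every KB relation of that pair.

    Entity matching is naive substring search (an assumption made for brevity).
    """
    bags = defaultdict(list)
    for sent in sentences:
        for head, tail in kb:
            if head in sent and tail in sent:
                bags[(head, tail)].append(sent)
    # Each bag inherits all relations of its pair, whether or not a given
    # sentence actually expresses them.
    return {pair: (sents, sorted(kb[pair])) for pair, sents in bags.items()}


labeled = distant_label(SENTENCES, KB)
for pair, (sents, relations) in labeled.items():
    print(pair, relations, len(sents))
```

In this toy run, the second sentence is labeled `born_in` even though it does not express that relation, and the third sentence receives two labels at once. These are the noise and multi-label properties the abstract refers to.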
Similar resources
Learning with Noise: Enhance Distantly Supervised Relation Extraction with Dynamic Transition Matrix
Distant supervision significantly reduces human efforts in building training data for many classification tasks. While promising, this technique often introduces noise to the generated training data, which can severely affect the model performance. In this paper, we take a deep look at the application of distant supervision in relation extraction. We show that the dynamic transition matrix can ...
Prior-informed Distant Supervision for Temporal Evidence Classification
Temporal evidence classification, i.e., finding associations between temporal expressions and relations expressed in text, is an important part of temporal relation extraction. To capture the variations found in this setting, we employ a distant supervision approach, modeling the task as multi-class text classification. There are two main challenges with distant supervision: (1) noise generated...
Relation Extraction from the Web Using Distant Supervision
Extracting information from Web pages requires the ability to work at Web scale in terms of the number of documents, the number of domains and domain complexity. Recent approaches have used existing knowledge bases to learn to extract information with promising results. In this paper we propose the use of distant supervision for relation extraction from the Web. Distant supervision is a method ...
Removing Noisy Mentions for Distant Supervision Eliminando Menciones Ruidosas para la Supervisión a Distancia
Relation Extraction methods based on Distant Supervision rely on true tuples to retrieve noisy mentions, which are then used to train traditional supervised relation extraction methods. In this paper we analyze the sources of noise in the mentions, and explore simple methods to filter out noisy mentions. The results show that a combination of mention frequency cut-off, Pointwise Mutual Informat...
Combining Generative and Discriminative Model Scores for Distant Supervision
Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts an...